Extending BLEU MT Evaluation Method with Frequency Weighting

Authors

  • Bogdan Babych
  • Anthony Hartley
Abstract

We present the results of an experiment on extending BLEU, the automatic method of Machine Translation evaluation, with statistical weights for lexical items, such as tf.idf scores. We show that this extension gives additional information about evaluated texts; in particular it allows us to measure translation Adequacy, which, for statistical MT systems, is often overestimated by the baseline BLEU method. The proposed model uses a single human reference translation, which increases the usability of the proposed method for practical purposes. The model suggests a linguistic interpretation which relates frequency weights and human intuition about translation Adequacy and Fluency.
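The idea of weighting lexical items can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes unigrams only, a single reference, and a smoothed idf weight per token (token counts stand in for the tf part). The function names `smoothed_idf` and `weighted_unigram_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def smoothed_idf(word, corpus):
    """Smoothed inverse document frequency of `word` over `corpus` (a list of token lists)."""
    df = sum(1 for doc in corpus if word in doc)
    # Add-one smoothing keeps the weight finite for words absent from the corpus.
    return math.log((len(corpus) + 1) / (df + 1)) + 1.0


def weighted_unigram_scores(candidate, reference, corpus):
    """Return (precision, recall) over unigrams, with each token weighted by its idf."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    idf = {w: smoothed_idf(w, corpus) for w in set(cand_counts) | set(ref_counts)}

    # Clip each candidate count by the reference count, as baseline BLEU does.
    matched = sum(min(c, ref_counts[w]) * idf[w] for w, c in cand_counts.items())
    cand_total = sum(c * idf[w] for w, c in cand_counts.items())
    ref_total = sum(c * idf[w] for w, c in ref_counts.items())

    precision = matched / cand_total if cand_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall


# Toy data (assumed for illustration): "the" occurs in every document, "cat" in one.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "bird", "flew"]]
reference = ["the", "cat", "sat"]

_, recall_drops_rare = weighted_unigram_scores(["the", "sat"], reference, corpus)
_, recall_drops_common = weighted_unigram_scores(["cat", "sat"], reference, corpus)
print(recall_drops_rare, recall_drops_common)
```

In this toy setting, omitting the rarer word "cat" lowers weighted recall more than omitting the frequent word "the", which is the intuition behind relating frequency weights to Adequacy.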

Similar resources

Extending the BLEU MT Evaluation Method with Frequency Weightings

We present the results of an experiment on extending BLEU, the automatic method of Machine Translation evaluation, with statistical weights for lexical items, such as tf.idf scores. We show that this extension gives additional information about evaluated texts; in particular it allows us to measure translation Adequacy, which, for statistical MT systems, is often overestimated by the baseline BLE...

Extending MT evaluation tools with translation complexity metrics

In this paper we report on the results of an experiment in designing resource-light metrics that predict the potential translation complexity of a text or a corpus of homogeneous texts for state-of-the-art MT systems. We show that the best prediction of translation complexity is given by the average number of syllables per word (ASW). The translation complexity metrics based on this parameter are...

Weighted N-gram model for evaluating Machine Translation output

I present the results of an experiment on extending an automatic method of Machine Translation evaluation (BLEU) with weights for the statistical significance of lexical items. I show that this extension gives additional information about evaluated texts; in particular it allows us to measure translation Adequacy, which, for statistical MT systems, is often overestimated by the baseline BLEU me...

Measuring Confidence Intervals for the Machine Translation Evaluation Metrics

Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST metric, are becoming increasingly important in MT. This paper reports a novel method of calculating the confidence intervals for BLEU/NIST scores using bootstrapping. With this method, we can determine whether two MT systems are significantly different from each other. We study the effect of tes...

Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?

Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST metric, are becoming increasingly important in MT. Yet, their behaviors are not fully understood. In this paper, we analyze some flaws in the BLEU/NIST metrics. With a better understanding of these problems, we can better interpret the reported BLEU/NIST scores. In addition, this paper reports a...

Publication date: 2004